Abstract

Mobile agent technology has become a new paradigm for distributed real-time systems because of its inherent advantages. As in any distributed system, survivability and fault tolerance are vital issues when deploying mobile agent systems. E-business, which has become a prominent domain for agent technology, also faces reliability problems caused by failures of the agent platform, communication links, and so on. Reliability is a factor that may affect the performance, availability, and strategy of mobile agent systems. In this paper, reliability issues of mobile agents, particularly in an e-business environment, are discussed. Models for mobile agent reliability are developed, and a Shopping Consultant Agent System (SCAS) is built as an experimental mobile-agent-based e-business application. Reliability problems of the system are identified, and two simple solutions, namely periodic scan and forward echo, are implemented. The reliability improvement gained by these solutions is evaluated according to the reliability model developed in this paper.

Keywords: E-Business, Reliability, Multi-Agent System

1. Introduction

Mobile agents have received much attention in the last decade because of their advantages in accessing distributed resources over low-bandwidth networks. By migrating to an information resource, an agent can invoke resource operations locally, eliminating the network transfer of intermediate data. Agent technology also offers several other advantages [1] to distributed systems such as electronic business. In spite of these inherent advantages, the reliability of the agent platform and of the computer communication network is a factor that may affect the performance, availability, and strategy of mobile agent systems [2], and hence the deployment of such systems. Multi-agent systems that implement distributed planning and execution are highly complex to design and model, and this complexity makes it hard to derive the reliability of the system theoretically.

Problems such as host failure, communication failure, and loss of agents and/or their states exist, as they do in other distributed systems. Solutions such as replication, consensus protocols, agent rerouting, and agent persistence are yet to be fully developed, though some work has already been done [3]. To be a viable alternative for distributed system development, mobile agent systems must be sufficiently reliable. This paper examines reliability issues for a mobile-agent-based distributed system, particularly in an electronic commerce environment, in which reliability becomes especially important because money transactions are involved.

A literature survey shows that masking [4], check-pointing [5, 6], exception handling [7], and software rejuvenation and reconfigurable itineraries [8] are some of the approaches proposed and partially implemented to enhance the survivability and fault tolerance of agent-based systems. Lyu et al. [9] proposed an agent-server architecture that recovers failed agents with a new failure detection and recovery approach, and showed that it improves the survivability and fault tolerance of an agent-based system at the expense of time and space. Daoud et al. [10] studied and evaluated the reliability of a mobile agent system with respect to the network status and its conditions. Investigating the reliability of mobile agent systems based on other factors is suggested as future work in [10]. The reliability of the data in the business database of an agent-based e-business system is also enhanced by using shared objects, and a generic model is presented in [11].

This paper experiments with reliability issues of mobile agent technology by developing a prototype mobile agent system named Shopping Consultant Agent System (SCAS), built using the Java Agent Development Environment (JADE) [12] architecture. Possible failures of the system are discussed, and two fault-tolerance measures, namely periodic scan and forward echo, are implemented. A simple model to evaluate the reliability of the system is derived. The reliability improvement gained by the solutions is evaluated according to the model developed.

2. Reliability Concerns of Mobile Agents

Like other distributed systems, a mobile agent system may fail for two reasons: site failure and communication failure. A site failure can have two different consequences. If the mobile agent is not residing on the failing site, the agent stays alive with its state. However, if the failing site is one of the destinations of the agent, the agent must reroute its itinerary. If the mobile agent is residing on the failing site, the agent is lost, together with its state and computation results. Persistence of agents is an issue specific to mobile agent systems. However, it poses little new challenge, since existing techniques such as logging, check-pointing, and transaction processing may be applied directly.

In case of a communication failure, the mobile agent must be informed of the failure, and it must be able to reroute its itinerary. Otherwise, it will wait indefinitely for the failed communication link to recover, and the system will be virtually dead. In short, agent persistence and agent rerouting are two of the new challenges that mobile agent systems bring to reliability research.

3. Reliability Modeling for Mobile Agents

All software may fail over time due to bugs or defects. Mobile agents are merely pieces of software, and hence traditional reliability modeling applies readily. In this section, a simple model for evaluating the reliability of a mobile agent system is derived.

For simplicity, it is assumed that the mobile agent system has already been in use for a period of time and has stabilized. That is, a constant reliability function for a mobile agent on each host is assumed. This also implies that the failure rate of an agent on each particular host is constant.

Consider a system in which an agent travels around n hosts. Suppose the failure rate of the agent running on host i is p_i. Then the failure rate P’ of the mobile agent system with n hosts, assuming no communication failure and independent failures on different hosts, is

P’ = 1 - P(no failure of agent on each host)

   = 1 - (1 - p_1)(1 - p_2) ... (1 - p_n)

Reliability of the corresponding mobile agent system can be defined as the probability of no failure of the system. Therefore, it is simply (1 - P’).

Theoretically, the failure rate of the system is bounded above by one. In practice, however, the failure rate of a software system should never grow as high as one. As the number of hosts increases toward a certain maximum value, the curve of P’ against the number of hosts saturates at a particular value of P’, which can be called the saturated failure rate of the mobile agent system.

With this reliability model, it follows directly that a system with a low value of P’ is more reliable than one with a high value of P’. This model is applied to evaluate some reliability measures that are developed.
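The model above can be sketched in a few lines of Python. This is only an illustration of the formula P’ = 1 - (1 - p_1)(1 - p_2) ... (1 - p_n); the function names and the per-host value 0.2 used in the demonstration are assumptions chosen for the example, not values taken from the experiment.

```python
from math import prod


def system_failure_probability(host_failure_probs):
    """Return P' = 1 - prod(1 - p_i) for an agent that visits every host once."""
    for p in host_failure_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each p_i must lie in [0, 1]")
    return 1.0 - prod(1.0 - p for p in host_failure_probs)


def system_reliability(host_failure_probs):
    """Reliability is the probability that no host visit fails: 1 - P'."""
    return 1.0 - system_failure_probability(host_failure_probs)


if __name__ == "__main__":
    # Hypothetical per-host value p_i = 0.2, chosen only for illustration.
    for n in (1, 2, 3, 10):
        print(n, round(system_failure_probability([0.2] * n), 4))
```

Because every factor (1 - p_i) is at most one, P’ can only grow as hosts are added to the itinerary, which is why the curve rises toward a saturation value.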

4. Experimental System - SCAS: Shopping Consultant Agent System

The Shopping Consultant Agent System (SCAS) is a web-based mobile agent system that provides users with information on the products for sale in an electronic marketplace. An electronic marketplace consists of hosts that sell products on the network. The system collects and compares the prices of a user-specified set of products from different seller hosts in the marketplace. It is written in the Java programming language on top of the JADE application programming interface (API). The system was developed to analyze the security and reliability aspects of agent-based e-business systems. Since security is an aspect of reliability [2], the security aspects of the system were also studied, and the results of that work are given in [13].

4.1. System Design

SCAS implements mobile agents to retrieve product information in an electronic market for users. Each seller maintains a database that stores the prices and quantities in stock of different products available at that host.

Figure 1 Shopping Consultant Agent System Architecture

SCAS allows users to specify a set of products from the list and the corresponding quantities they want to buy. An agent is created for the user and collects price details from hosts in the e-marketplace on the user's behalf. The itinerary of the agent is determined before the agent is launched. After the agent has visited all hosts specified in its itinerary, it returns to its sender and reports the prices. The architectural components of the system are depicted in Figure 1 and described in the following section.

4.2. Components Design and Description

The major components present in the system are designed as follows:

Client Browser initiates the action by sending requests for prices of required items by specifying the product IDs and quantities.
LauncherAgent, which resides in the home site of the e-marketplace, is responsible for creating a mobile agent on behalf of the user, providing the necessary details to the mobile agent, sending it to the hosts, receiving it on return, and presenting the information as a response to the user.
MobileAgent keeps a list of product IDs and a list of the corresponding quantities specified by users. It is responsible for traveling around the network and collecting price information for users from different hosts.
RemoteAgent, which resides in each host, provides the price details to the mobile agent when it arrives, by querying the corresponding database present in the host.
Key Server stores the public keys of participating hosts and mobile agents. It provides these details to LauncherAgent and RemoteAgent.
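The interaction between LauncherAgent, MobileAgent and RemoteAgent can be sketched as follows. This is a simplified, in-process illustration in Python rather than the JADE implementation: the class names `RemoteHost` and `MobileAgent`, the function `launch`, and the shape of the price table are all hypothetical, and migration is modeled as a plain method call. The Key Server and all security steps are left out.

```python
from dataclasses import dataclass, field


@dataclass
class RemoteHost:
    """Stand-in for a seller host: the RemoteAgent plus its price database."""
    name: str
    prices: dict  # product ID -> (unit price, quantity in stock)

    def quote(self, product_id, quantity):
        """Return the total price, or None if the host cannot fill the order."""
        entry = self.prices.get(product_id)
        if entry is None or entry[1] < quantity:
            return None
        return entry[0] * quantity


@dataclass
class MobileAgent:
    """Carries the shopping list and accumulates quotes host by host."""
    order: dict  # product ID -> requested quantity
    results: dict = field(default_factory=dict)

    def visit(self, host):
        """Query the host for every product on the list and store the answers."""
        self.results[host.name] = {
            pid: host.quote(pid, qty) for pid, qty in self.order.items()
        }


def launch(order, itinerary):
    """LauncherAgent role: create the agent, send it along the fixed itinerary,
    and return the collected report."""
    agent = MobileAgent(order=dict(order))
    for host in itinerary:
        agent.visit(host)
    return agent.results
```

The itinerary is passed in as a fixed list, which mirrors the design decision that the route is determined before the agent is launched.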

4.3. Reliability Design of SCAS

4.3.1. Reliability Problems of SCAS

When the Agent Server on a particular host fails, there can be two consequences:

If the agent is residing on the host with the failing agent server, the agent is lost, and any query result carried by the agent is lost with it. JADE does not provide agent persistence or recovery automatically, at least for now, so even when the failed server comes up again, the mobile agent cannot be recovered.
If the agent is about to visit the host with the failed server, the current host of the agent sends the agent to the failed server without asking for an acknowledgement from the receiving server. The sending server then erases its copy of the agent, even though no new copy of the agent has been created on the receiving host because of the failure of the agent server there.

In both cases, the agent will not return to the user, and SCAS will fail. Hence SCAS should be made more fault-tolerant before it is deployed in a real-time environment to handle complex actions that may involve money transactions.

4.3.2. Solutions to the Problems

Two measures, described below, are implemented to make SCAS more reliable:

Periodic scan by LaunchServer: To prevent an agent from migrating to a failed site and thereby being lost, a server monitoring system is implemented. It runs as a background process on the home site of the e-marketplace and periodically scans each host enrolled in the e-marketplace to check whether the agent server on it is running properly, in a way similar to the “echo” sent by agents. If the agent server on any host is found not to respond, the process automatically restarts the agent server on that host. This ensures that a failed agent server on any host is restarted within one scan period. Therefore, the chance of an agent waiting indefinitely for a failed server is eliminated. The LaunchServer is used to execute this process.

Forward echo by MobileAgent: Before a mobile agent is transported to its next destination, the agent sends an “echo” message to the Agent Server on the next host. If the receiving server replies within a specified period, the agent transmission goes ahead; otherwise, the agent marks the host as ‘unvisited’ and proceeds with the next host in the itinerary. This prevents the agent from mistakenly migrating to a remote site whose receiving server is not ready, in which case the agent would be erased without a new copy being created. By the time the agent completes its visits to the other sites, the failed site is expected to have been restarted by the periodic scan process. This approach prevents the agent from waiting indefinitely for a reply from a failed receiving server, and also removes the risk of losing the agent by migrating to a failed site.

These two measures are simple and easy to implement, and they bring a major reliability improvement. The detailed activities required of the LaunchServer, MobileAgent and RemoteAgent to implement these measures are depicted in Figure 2 and Figure 3.

Figure 2 Periodic Scan by LaunchServer
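The periodic scan can be sketched as a loop that probes each server and restarts the ones that do not answer. The following Python sketch is a minimal illustration under stated assumptions: `AgentServer`, `ping` and `restart` are hypothetical stand-ins for whatever mechanism the LaunchServer uses to probe and restart a remote agent server, and the scan interval is a free parameter.

```python
import time


class AgentServer:
    """Hypothetical stand-in for the agent server on one host."""

    def __init__(self, name):
        self.name = name
        self.running = True

    def ping(self):
        """Answer the scan probe; a stopped server does not respond."""
        return self.running

    def restart(self):
        """Bring the server back up."""
        self.running = True


def scan_once(servers):
    """One scan pass: probe every server, restart the silent ones,
    and return the names of the servers that were restarted."""
    restarted = []
    for server in servers:
        if not server.ping():
            server.restart()
            restarted.append(server.name)
    return restarted


def periodic_scan(servers, interval_seconds, passes, sleep=time.sleep):
    """Run `passes` scan passes, waiting `interval_seconds` between them."""
    log = []
    for i in range(passes):
        log.append(scan_once(servers))
        if i < passes - 1:
            sleep(interval_seconds)
    return log
```

The `sleep` parameter is injectable so that the loop can be exercised without real waiting; a deployed scanner would more likely run indefinitely instead of for a fixed number of passes.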

Figure 3 Forward Echo by MobileAgent
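The forward echo can be sketched as a guard placed in front of every migration. In the Python sketch below, `echo` and `visit` are hypothetical callables supplied by the caller: `echo(host)` is assumed to return True only if the agent server on that host replied within the timeout, and `visit(host)` stands for migrating to the host and doing the work there. Retrying skipped hosts in later rounds, and the limit `max_rounds`, are assumptions made for the sketch; the text only states that skipped hosts are marked as unvisited.

```python
def forward_echo_tour(itinerary, echo, visit, max_rounds=3):
    """Visit hosts in order, probing each one with `echo` before migrating.

    Hosts that do not reply are marked unvisited and retried in a later
    round, by which time a periodic scan may have restarted them.
    Returns (visited hosts in visiting order, hosts still unvisited).
    """
    pending = list(itinerary)
    visited = []
    for _ in range(max_rounds):
        unvisited = []
        for host in pending:
            if echo(host):
                visit(host)
                visited.append(host)
            else:
                # No reply in time: skip this host instead of migrating to it.
                unvisited.append(host)
        pending = unvisited
        if not pending:
            break
    return visited, pending
```

The agent never leaves its current host unless the next one has answered, so a failed destination costs at most one echo timeout rather than the agent itself.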

Other, more complicated measures, such as replication and logging of agents, could further increase the reliability of SCAS, but they are outside the scope of this paper.

5. Evaluation of the Reliable SCAS

SCAS failed only occasionally during development and experimentation. To speed up the experiment, a higher failure rate for the remote servers was simulated. For each remote server, a background job runs every minute and picks a random integer between 1 and 10. Whenever a 10 is drawn, the server on that host is terminated. This process is depicted in Figure 4. The failure rate of each remote server is therefore about 1/10 per minute. The reliability measure used is failure intensity, or rate of occurrence of failures (ROCOF), that is, the frequency with which unexpected behavior is likely to occur. It is measured over 5 operational time units, where one operational time unit is taken to be 1 minute. The results are stored to represent the failure rate before the reliability implementations. This failure rate is impractically high, and the saturated failure rate of the system can reach 100%. However, this does not matter in this experiment, as the main objective is only to evaluate how much reliability can be gained from the measures implemented.

Figure 4 Simulate Failure
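The failure injection described above can be sketched as one random draw per minute. The Python sketch below is an illustration only; the function names are hypothetical and the draws are assumed to be independent and uniform. Under that assumption, a single server that is never restarted is terminated within 5 one-minute units with probability 1 - 0.9^5, which is about 0.41.

```python
import random


def simulate_server(minutes, rng, kill_value=10, draw_max=10):
    """Return the minute at which the server is terminated, or None if it survives.

    Each minute one integer is drawn uniformly from 1..draw_max; drawing
    `kill_value` terminates the server.
    """
    for minute in range(1, minutes + 1):
        if rng.randint(1, draw_max) == kill_value:
            return minute
    return None


def failure_probability_within(minutes, per_minute=0.1):
    """Closed form: probability of at least one termination in `minutes` draws."""
    return 1.0 - (1.0 - per_minute) ** minutes


def estimate_failure_fraction(minutes, trials, seed=0):
    """Monte Carlo estimate of the fraction of runs terminated within `minutes`."""
    rng = random.Random(seed)
    failed = sum(simulate_server(minutes, rng) is not None for _ in range(trials))
    return failed / trials
```

Comparing `estimate_failure_fraction(5, trials)` with `failure_probability_within(5)` for a large number of trials is a quick sanity check that the injected failure rate is the intended one.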

To implement the proposed reliability enhancements, the home site is started with a process that scans all the remote servers every 5 minutes (since the failure is simulated and measured over 5 operational time units) and restarts any remote server that is down. In this scenario, the failure simulation process is made to fail the server and count the failure only if it is down, so that the failure intensity can be measured properly. The results are stored to represent the failure rate after the reliability implementations. The data collected are tabulated in Table 1.

Figure 5 Reliability Curves of SCAS

Table 1 Failure Rate Before and After Reliability Enhancements

No. of Hosts	Failure Rate Before Reliability Implementation (%)	Failure Rate After Reliability Implementation (%)
1	20	20
2	52	36
3	52	36
4	61.6	36
5	61.6	36
6	69.28	36
7	69.28	48.8
8	75.424	48.8
9	80.3392	48.8
10	84.27136	48.8
11	87.417088	48.8
12	89.9336704	48.8
13	89.9336704	48.8
14	91.94693632	48.8
15	91.94693632	48.8
16	93.55754906	48.8
17	93.55754906	48.8
18	94.84603924	59.04
19	94.84603924	59.04
20	95.8768314	67.232
21	97.52609884	67.232
22	98.5156593	73.7856
23	98.81252744	73.7856
24	98.5156593	79.02848
25	99.05002195	79.02848
26	99.24001756	79.02848
27	99.24001756	79.02848
28	99.39201405	79.02848
29	99.51361124	83.222784
30	99.61088899	83.222784
31	99.68871119	83.222784
32	99.75096896	83.222784
33	99.80077516	86.5782272

From the results plotted in Figure 5, neglecting the final bursts, the saturated failure rate of the system without the reliability implementation is 100%, while that of the system with the reliability implementation decreases to about 49%. Therefore, there is a gain of about 51 percentage points in reliability. However, a saturated failure rate of 49% is still high, bearing in mind that the failure rate of the servers was intentionally increased. The time period for failure simulation, namely the operational time unit, and the time period for the periodic scan can be made configurable parameters to suit the application's requirements.
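The figure of about 49% can be read off the "after" column of Table 1 as its longest plateau, and the quoted gain follows by subtraction from the 100% level. The short Python sketch below makes that arithmetic explicit. The column values are copied from Table 1 in run-length form, while the helper names and the choice of "longest run of equal values" as the definition of the plateau are assumptions made for the illustration.

```python
from itertools import groupby

# "After" column of Table 1 (percent) for hosts 1..33, as (value, repeat count).
AFTER_RUNS = [(20, 1), (36, 5), (48.8, 11), (59.04, 2), (67.232, 2),
              (73.7856, 2), (79.02848, 5), (83.222784, 4), (86.5782272, 1)]
AFTER = [value for value, count in AFTER_RUNS for _ in range(count)]


def longest_plateau(values):
    """Return (value, run length) of the longest run of equal consecutive values."""
    runs = [(value, len(list(group))) for value, group in groupby(values)]
    return max(runs, key=lambda run: run[1])


def gain_in_points(before_pct, after_pct):
    """Drop in failure rate, which equals the gain in reliability, in percentage points."""
    return before_pct - after_pct
```

For example, `longest_plateau(AFTER)` picks out the value 48.8, held from host 7 to host 17, and `gain_in_points(100.0, 48.8)` gives the gain relative to the 100% level.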

6. Conclusions and Future Work

The theoretical model developed in this paper is used to evaluate the reliability measures implemented in the experimental agent-based e-business application. One minute is chosen as the operational time unit for failure simulation, and the evaluation shows that a reliability gain of about 51 percentage points is achieved when the periodic scan runs every 5 operational time units. However, if the server hosting the agent fails, fault tolerance depends on the persistence mechanisms provided by the agent platform.

The reliability measure used in the evaluation is failure intensity (ROCOF). Other reliability measures, such as MTTF and MTBF, can also be used to further evaluate and enhance the reliability of agent-based e-business systems; this is being carried out as active research work. A future research direction is to set the configuration parameters automatically to values that suit the application's requirements.

References

[1] Danny B. Lange and Mitsuru Oshima. "Seven Good Reasons for Mobile Agents", Communications of the ACM, pp. 88-89, March 1999.

[2] Kenneth P. Birman. Building Secure and Reliable Network Applications. Manning, 1996.

[3] Tom Walsh, Noemi Paciorek and David Wong. “Security and Reliability in Concordia”. In Proceedings of the 31st Annual Hawaii International Conference on System Sciences (HICSS31), 1998.

[4] S. Pleisch and A. Schiper, “Fault-Tolerant Mobile Agent Execution,” IEEE Trans. Computers, vol. 52, no. 2, pp. 209–222.

[5] M. Dalmeijer, “A Reliable Mobile Agents Architecture,” Proc. 1st Int’l Symp. Object- Oriented Real-Time Distributed Computing, IEEE CS Press, 1998, pp. 64–72.

[6] T. Osman, W. Wagealla and A. Bargiela, “An Approach to Rollback Recovery of Collaborating Mobile Agents,” IEEE Trans. Systems, Man and Cybernetics, Part C, vol. 34, no. 1, pp. 48–57.

[7] S. Pears, J. Xu and C. Boldyreff, “Mobile Agent Fault Tolerance for Information Retrieval Applications: An Exception Handling Approach,” Proc. 6th Int’l Symp. Autonomous Decentralized Systems, IEEE CS Press, 2003, pp. 115–122.

[8] L.M. Silva, V. Batista and J.G. Silva, “Fault-Tolerant Execution of Mobile Agents,” Proc. Int’l Conf. Dependable Systems and Networks, IEEE CS Press, 2000, pp. 135–143.

[9] Michael R. Lyu, Xinyu Chen and Tsz Yeung Wong, “Design and Evaluation of a Fault-Tolerant Mobile-Agent System,” IEEE Intelligent Systems, IEEE Computer Society, 2004.

[10] Mosaab Daoud and Qusay H. Mahmoud, “Reliability Estimation of Mobile Agent Systems using the Monte Carlo Approach,” Proc. 19th Int’l Conf. Advanced Information Networking and Applications (AINA’05), IEEE, 2005.

[11] A. Kannammal, V. Ramachandran and N.Ch.S.N. Iyengar, “An Enhanced Secure and Scalable Model for Enterprise Applications using Automated Monitoring”, communicated for publication in Int’l Journal of Computer Systems and Engineering, Australia.

[12] Java Agent Development Environment (JADE), http://jade.tilab.com

[13] A. Kannammal, V. Ramachandran and N.Ch.S.N. Iyengar, “Secure Mobile Agent System for E-Business Applications,” in Proc. 4th ACS/IEEE International Conference on Computer Systems and Applications, March 8-11, 2006, IEEE Computer Society Press. (Accepted for Presentation & Publication)

[14] FIPA, Federation for Intelligent Physical Agents, http://www.fipa.org.

Technical College - Bourgas,

A. Kannammal*, V. Ramachandran**, N.Ch.S.N. Iyengar***
